{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# AInsight: AI/ML Weekly News Reporter"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📚 Overview\n",
    "This notebook demonstrates the implementation of an intelligent news aggregation and summarization system using a multi-agent architecture. AInsight automatically collects, processes, and summarizes AI/ML news for general audiences."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Motivation\n",
    "The rapid evolution of AI/ML technology creates several challenges:\n",
    "- Information overload from multiple news sources\n",
    "- Technical complexity making news inaccessible to general audiences\n",
    "- Time-consuming manual curation and summarization\n",
    "- Inconsistent reporting formats and quality\n",
    "\n",
    "AInsight addresses these challenges through:\n",
    "- Automated news collection and filtering\n",
    "- Intelligent summarization for non-technical readers\n",
    "- Consistent, well-structured reporting\n",
    "- Scalable, maintainable architecture\n",
    "- Saving time and effort!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🏗️ Multi-Agent System Architecture\n",
    "\n",
    "AInsight processes news through three specialized agents:\n",
    "\n",
    "1. **NewsSearcher Agent**\n",
    "   - Primary news collection engine\n",
    "   - Interfaces with Tavily API\n",
    "   - Filters for relevance and recency\n",
    "   - Handles source diversity\n",
    "\n",
    "2. **Summarizer Agent**\n",
    "   - Processes technical content\n",
    "   - Uses gpt-4o-mini for natural language generation (LLM can be configured per user preference, used OpenAI in this tutorial for accessibility)\n",
    "   - Handles technical term simplification\n",
    "\n",
    "3. **Publisher Agent**\n",
    "   - Takes list of summaries as input\n",
    "   - Formats them into a structured prompt\n",
    "   - Makes single gpt-4o-mini call to generate complete report with:\n",
    "     * Introduction section\n",
    "     * Organized summaries\n",
    "     * Further reading links\n",
    "   - Saves final report as markdown file"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<div style=\"text-align: center;\">\n",
    "\n",
    "<img src=\"../images/ainsight_langgraph.svg\" alt=\"ainsight by langgraph\" style=\"width:20%; height:50%;\">\n",
    "</div>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🎯 Learning Objectives\n",
    "1. Understand multi-agent system architecture\n",
    "2. Implement state management with LangGraph\n",
    "3. Work with external APIs (Tavily, OpenAI)\n",
    "4. Create modular, maintainable Python code"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 🔧 Technical Requirements\n",
    "- Python 3.11+\n",
    "- OpenAI API key\n",
    "- Tavily API key\n",
    "- Required packages (see setup section)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🚀 Setup and Configuration\n",
    "\n",
    "First, let's install the required packages:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install langchain langchain-openai langgraph tavily-python python-dotenv"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Environment Configuration\n",
    "\n",
    "Create a `.env` file in your project directory with the following:\n",
    "\n",
    "```plaintext\n",
    "OPENAI_API_KEY=your-openai-api-key\n",
    "TAVILY_API_KEY=your-tavily-api-key\n",
    "```"
   ]
  },
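  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A missing key usually shows up later as an authentication error that is harder to trace. As a minimal sketch, the check below lists which of the two variables above are unset. `missing_keys` is a hypothetical helper for this tutorial, not part of any library.\n",
    "\n",
    "```python\n",
    "import os\n",
    "\n",
    "# Variable names taken from the .env example above\n",
    "REQUIRED_KEYS = ('OPENAI_API_KEY', 'TAVILY_API_KEY')\n",
    "\n",
    "def missing_keys(env=None):\n",
    "    # Return the required variable names that are unset or empty.\n",
    "    # env defaults to the process environment; pass a dict to test.\n",
    "    env = os.environ if env is None else env\n",
    "    return [name for name in REQUIRED_KEYS if not env.get(name)]\n",
    "\n",
    "# Example with an explicit mapping, so no real keys are needed\n",
    "print(missing_keys({'OPENAI_API_KEY': 'sk-example'}))  # prints ['TAVILY_API_KEY']\n",
    "```\n",
    "\n",
    "Called with no argument after `load_dotenv()` has run, it checks the real environment."
   ]
  },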
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import dependencies\n",
    "import os\n",
    "from typing import Dict, List, Any, TypedDict, Optional\n",
    "from datetime import datetime\n",
    "from pydantic import BaseModel\n",
    "from dotenv import load_dotenv\n",
    "from tavily import TavilyClient\n",
    "from langchain_openai import ChatOpenAI\n",
    "from langchain_core.messages import HumanMessage, SystemMessage\n",
    "from langgraph.graph import StateGraph\n",
    "\n",
    "# Load environment variables\n",
    "load_dotenv()\n",
    "\n",
    "# Initialize API clients\n",
    "tavily = TavilyClient(api_key=os.getenv(\"TAVILY_API_KEY\"))\n",
    "llm = ChatOpenAI(\n",
    "    model=\"gpt-4o-mini\",\n",
    "    temperature=0.1,\n",
    "    max_tokens=600\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📊 Data Models and State Management\n",
    "\n",
    "Think of state as a \"memory\" that will flow through your workflow (graph) later.\n",
    "\n",
    "We use Pydantic and TypedDict to define our data structures:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Article(BaseModel):\n",
    "    \"\"\"\n",
    "    Represents a single news article\n",
    "    \n",
    "    Attributes:\n",
    "        title (str): Article headline\n",
    "        url (str): Source URL\n",
    "        content (str): Article content\n",
    "    \"\"\"\n",
    "    title: str\n",
    "    url: str\n",
    "    content: str\n",
    "\n",
    "class Summary(TypedDict):\n",
    "    \"\"\"\n",
    "    Represents a processed article summary\n",
    "    \n",
    "    Attributes:\n",
    "        title (str): Original article title\n",
    "        summary (str): Generated summary\n",
    "        url (str): Source URL for reference\n",
    "    \"\"\"\n",
    "    title: str\n",
    "    summary: str\n",
    "    url: str\n",
    "\n",
    "# This defines what information we can store and pass between nodes later\n",
    "class GraphState(TypedDict):\n",
    "    \"\"\"\n",
    "    Maintains workflow state between agents\n",
    "    \n",
    "    Attributes:\n",
    "        articles (Optional[List[Article]]): Found articles\n",
    "        summaries (Optional[List[Summary]]): Generated summaries\n",
    "        report (Optional[str]): Final compiled report\n",
    "    \"\"\"\n",
    "    articles: Optional[List[Article]] \n",
    "    summaries: Optional[List[Summary]] \n",
    "    report: Optional[str] "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🤖 Agent Implementation\n",
    "\n",
    "### 1. NewsSearcher Agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class NewsSearcher:\n",
    "    \"\"\"\n",
    "    Agent responsible for finding relevant AI/ML news articles\n",
    "    using the Tavily search API\n",
    "    \"\"\"\n",
    "    \n",
    "    def search(self) -> List[Article]:\n",
    "        \"\"\"\n",
    "        Performs news search with configured parameters\n",
    "        \n",
    "        Returns:\n",
    "            List[Article]: Collection of found articles\n",
    "        \"\"\"\n",
    "        response = tavily.search(\n",
    "            query=\"artificial intelligence and machine learning news\", \n",
    "            topic=\"news\",\n",
    "            time_period=\"1w\",\n",
    "            search_depth=\"advanced\",\n",
    "            max_results=5\n",
    "        )\n",
    "        \n",
    "        articles = []\n",
    "        for result in response['results']:\n",
    "            articles.append(Article(\n",
    "                title=result['title'],\n",
    "                url=result['url'],\n",
    "                content=result['content']\n",
    "            ))\n",
    "        \n",
    "        return articles"
   ]
  },
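  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `search` method above keeps whatever Tavily ranks highest, so several results can come from the same site. One way to enforce source diversity, sketched here as an assumption rather than as part of the original pipeline, is to cap the number of articles per domain. `limit_per_domain` is a hypothetical helper that works on plain URL strings.\n",
    "\n",
    "```python\n",
    "from urllib.parse import urlparse\n",
    "\n",
    "def limit_per_domain(urls, max_per_domain=1):\n",
    "    # Keep at most max_per_domain URLs per host, preserving order.\n",
    "    counts = {}\n",
    "    kept = []\n",
    "    for url in urls:\n",
    "        host = urlparse(url).netloc.lower()\n",
    "        if host.startswith('www.'):\n",
    "            host = host[4:]  # treat www.example.com and example.com as one site\n",
    "        counts[host] = counts.get(host, 0) + 1\n",
    "        if counts[host] <= max_per_domain:\n",
    "            kept.append(url)\n",
    "    return kept\n",
    "\n",
    "urls = [\n",
    "    'https://www.example.com/a',\n",
    "    'https://example.com/b',\n",
    "    'https://news.example.org/c',\n",
    "]\n",
    "print(limit_per_domain(urls))\n",
    "# prints ['https://www.example.com/a', 'https://news.example.org/c']\n",
    "```\n",
    "\n",
    "To use it in `search`, filter `response['results']` by each `result['url']` before building the `Article` objects."
   ]
  },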
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. Summarizer Agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Summarizer:\n",
    "    \"\"\"\n",
    "    Agent that processes articles and generates accessible summaries\n",
    "    using gpt-4o-mini\n",
    "    \"\"\"\n",
    "    \n",
    "    def __init__(self):\n",
    "        self.system_prompt = \"\"\"\n",
    "        You are an AI expert who makes complex topics accessible \n",
    "        to general audiences. Summarize this article in 2-3 sentences, focusing on the key points \n",
    "        and explaining any technical terms simply.\n",
    "        \"\"\"\n",
    "    \n",
    "    def summarize(self, article: Article) -> str:\n",
    "        \"\"\"\n",
    "        Generates an accessible summary of a single article\n",
    "        \n",
    "        Args:\n",
    "            article (Article): Article to summarize\n",
    "            \n",
    "        Returns:\n",
    "            str: Generated summary\n",
    "        \"\"\"\n",
    "        response = llm.invoke([\n",
    "            SystemMessage(content=self.system_prompt),\n",
    "            HumanMessage(content=f\"Title: {article.title}\\n\\nContent: {article.content}\")\n",
    "        ])\n",
    "        return response.content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. Publisher Agent"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Publisher:\n",
    "    \"\"\"\n",
    "    Agent that compiles summaries into a formatted report \n",
    "    and saves it to disk\n",
    "    \"\"\"\n",
    "    \n",
    "    def create_report(self, summaries: List[Dict]) -> str:\n",
    "        \"\"\"\n",
    "        Creates and saves a formatted markdown report\n",
    "        \n",
    "        Args:\n",
    "            summaries (List[Dict]): Collection of article summaries\n",
    "            \n",
    "        Returns:\n",
    "            str: Generated report content\n",
    "        \"\"\"\n",
    "        prompt = \"\"\"\n",
    "        Create a weekly AI/ML news report for the general public. \n",
    "        Format it with:\n",
    "        1. A brief introduction\n",
    "        2. The main news items with their summaries\n",
    "        3. Links for further reading\n",
    "        \n",
    "        Make it engaging and accessible to non-technical readers.\n",
    "        \"\"\"\n",
    "        \n",
    "        # Format summaries for the LLM\n",
    "        summaries_text = \"\\n\\n\".join([\n",
    "            f\"Title: {item['title']}\\nSummary: {item['summary']}\\nSource: {item['url']}\"\n",
    "            for item in summaries\n",
    "        ])\n",
    "        \n",
    "        # Generate report\n",
    "        response = llm.invoke([\n",
    "            SystemMessage(content=prompt),\n",
    "            HumanMessage(content=summaries_text)\n",
    "        ])\n",
    "        \n",
    "        # Add metadata and save\n",
    "        current_date = datetime.now().strftime(\"%Y-%m-%d\")\n",
    "        markdown_content = f\"\"\"\n",
    "        Generated on: {current_date}\n",
    "\n",
    "        {response.content}\n",
    "        \"\"\"\n",
    "        \n",
    "        filename = f\"ai_news_report_{current_date}.md\"\n",
    "        with open(filename, 'w') as f:\n",
    "            f.write(markdown_content)\n",
    "        \n",
    "        return response.content"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🔄 Workflow Implementation\n",
    "\n",
    "### State Management Nodes"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can think of nodes as the \"workers\" (aka agents) in your workflow. Each node:\n",
    "\n",
    "1. Takes the current state\n",
    "2. Processes it\n",
    "3. Returns updated state\n",
    "\n",
    "For example the node of NewsSearcher agent:  \n",
    "\n",
    "1. Takes current state (empty at first)\n",
    "2. Searches for articles\n",
    "3. Updates state with found articles\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "def search_node(state: Dict[str, Any]) -> Dict[str, Any]:\n",
    "    \"\"\"\n",
    "    Node for article search\n",
    "    \n",
    "    Args:\n",
    "        state (Dict[str, Any]): Current workflow state\n",
    "        \n",
    "    Returns:\n",
    "        Dict[str, Any]: Updated state with found articles\n",
    "    \"\"\"\n",
    "    searcher = NewsSearcher()\n",
    "    state['articles'] = searcher.search() \n",
    "    return state\n",
    "\n",
    "def summarize_node(state: Dict[str, Any]) -> Dict[str, Any]:\n",
    "    \"\"\"\n",
    "    Node for article summarization\n",
    "    \n",
    "    Args:\n",
    "        state (Dict[str, Any]): Current workflow state\n",
    "        \n",
    "    Returns:\n",
    "        Dict[str, Any]: Updated state with summaries\n",
    "    \"\"\"\n",
    "    summarizer = Summarizer()\n",
    "    state['summaries'] = []\n",
    "    \n",
    "    for article in state['articles']: # Uses articles from previous node\n",
    "        summary = summarizer.summarize(article)\n",
    "        state['summaries'].append({\n",
    "            'title': article.title,\n",
    "            'summary': summary,\n",
    "            'url': article.url\n",
    "        })\n",
    "    return state\n",
    "\n",
    "def publish_node(state: Dict[str, Any]) -> Dict[str, Any]:\n",
    "    \"\"\"\n",
    "    Node for report generation\n",
    "    \n",
    "    Args:\n",
    "        state (Dict[str, Any]): Current workflow state\n",
    "        \n",
    "    Returns:\n",
    "        Dict[str, Any]: Updated state with final report\n",
    "    \"\"\"\n",
    "    publisher = Publisher()\n",
    "    report_content = publisher.create_report(state['summaries'])\n",
    "    state['report'] = report_content\n",
    "    return state"
   ]
  },
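  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see the state-passing pattern in isolation, the sketch below chains three stub nodes by hand, with no API calls. The stub functions and `run_pipeline` are hypothetical stand-ins written for illustration; they show roughly the hand-off that the compiled graph automates.\n",
    "\n",
    "```python\n",
    "def stub_search(state):\n",
    "    # Pretend one article was found\n",
    "    state['articles'] = [{'title': 'Demo', 'url': 'https://example.com', 'content': 'Text'}]\n",
    "    return state\n",
    "\n",
    "def stub_summarize(state):\n",
    "    # Reads the articles written by the previous node\n",
    "    state['summaries'] = [\n",
    "        {'title': a['title'], 'summary': a['content'][:20], 'url': a['url']}\n",
    "        for a in state['articles']\n",
    "    ]\n",
    "    return state\n",
    "\n",
    "def stub_publish(state):\n",
    "    # Reads the summaries written by the previous node\n",
    "    state['report'] = ', '.join(s['title'] for s in state['summaries'])\n",
    "    return state\n",
    "\n",
    "def run_pipeline(state, nodes):\n",
    "    # Pass the state through each node in order\n",
    "    for node in nodes:\n",
    "        state = node(state)\n",
    "    return state\n",
    "\n",
    "final = run_pipeline(\n",
    "    {'articles': None, 'summaries': None, 'report': None},\n",
    "    [stub_search, stub_summarize, stub_publish],\n",
    ")\n",
    "print(final['report'])  # prints Demo\n",
    "```\n",
    "\n",
    "Each node only reads keys that an earlier node has filled in, which is why the order of the nodes matters."
   ]
  },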
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Workflow Graph Creation"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def create_workflow() -> StateGraph:\n",
    "    \"\"\"\n",
    "    Constructs and configures the workflow graph\n",
    "    search -> summarize -> publish\n",
    "    \n",
    "    Returns:\n",
    "        StateGraph: Compiled workflow ready for execution\n",
    "    \"\"\"\n",
    "    \n",
    "    # Create a workflow (graph) initialized with our state schema\n",
    "    workflow = StateGraph(state_schema=GraphState)\n",
    "    \n",
    "    # Add processing nodes that we will flow between\n",
    "    workflow.add_node(\"search\", search_node)\n",
    "    workflow.add_node(\"summarize\", summarize_node)\n",
    "    workflow.add_node(\"publish\", publish_node)\n",
    "    \n",
    "    # Define the flow with edges\n",
    "    workflow.add_edge(\"search\", \"summarize\") # search results flow to summarizer\n",
    "    workflow.add_edge(\"summarize\", \"publish\") # summaries flow to publisher\n",
    "    \n",
    "    # Set where to start\n",
    "    workflow.set_entry_point(\"search\")\n",
    "    \n",
    "    return workflow.compile()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🎬 Usage Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "=== AI/ML Weekly News Report ===\n",
      "\n",
      "# Weekly AI/ML News Report: November 15, 2024\n",
      "\n",
      "Welcome to this week's roundup of exciting developments in the world of artificial intelligence and machine learning! From educational initiatives to groundbreaking partnerships, the AI landscape is buzzing with innovation. Let’s dive into the highlights!\n",
      "\n",
      "## Key News Items\n",
      "\n",
      "### 1. Microsoft and OneLake Collaboration\n",
      "**Source:** [Solutions Review](https://solutionsreview.com/artificial-intelligence-news-for-the-week-of-november-15-updates-from-amd-ibm-openai-more/)  \n",
      "Microsoft has announced a new collaboration that enhances data management through its OneLake platform, part of Microsoft Fabric. This partnership aims to bolster the infrastructure needed to support AI and machine learning tasks, addressing the growing demand for robust data solutions in enterprise tech. Stay tuned for more updates and resources related to AI discussions in the tech community!\n",
      "\n",
      "### 2. Mississippi State University’s AI Training Program\n",
      "**Source:** [Government Technology](https://www.govtech.com/education/higher-ed/mississippi-state-to-teach-students-to-build-train-ai-systems)  \n",
      "Mississippi State University has secured a $1.2 million grant from the National Science Foundation to train 60 students in building and training AI systems focused on analyzing digital images. This program, in collaboration with 15 high school teachers, will provide students with hands-on experience in preparing image data and developing smart devices, enhancing their skills in intelligent vision tasks.\n",
      "\n",
      "### 3. Machine Learning and Gut Health\n",
      "**Source:** [Genetic Engineering & Biotechnology News](https://www.genengnews.com/topics/artificial-intelligence/machine-learning-reveals-impact-of-microbial-load-on-gut-health-and-disease/)  \n",
      "Researchers at EMBL Heidelberg have developed a machine learning model that estimates the density of microbes in the gut, known as microbial load, using only microbial composition data. This innovative approach could revolutionize how scientists study the gut microbiome, which is vital for understanding gut health and disease, allowing for more efficient analyses without additional experimental tests.\n",
      "\n",
      "### 4. OpenAI and Estée Lauder Partnership\n",
      "**Source:** [The Business of Fashion](https://www.businessoffashion.com/news/beauty/openai-partners-with-estee-lauder-on-rd/)  \n",
      "OpenAI has teamed up with Estée Lauder to provide employees access to over 240 advanced AI tools, known as generative pre-trained transformers (GPTs). This collaboration aims to assist in the development and marketing of new beauty products, showcasing the increasing integration of AI in the fashion and beauty sectors, especially as brands look to innovate during a slowdown in luxury sales.\n",
      "\n",
      "### 5. New Company for Healthcare Innovation\n",
      "**Source:** [Washington Technology](https://www.washingtontechnology.com/companies/2024/11/private-equity-firm-creates-company-harness-tech-societal-good/401016/?oref=wt-homepage-river)  \n",
      "Gavin Long's Pleasant Land group\n"
     ]
    }
   ],
   "source": [
    "if __name__ == \"__main__\":\n",
    "    # Initialize and run workflow\n",
    "    workflow = create_workflow()\n",
    "    final_state = workflow.invoke({\n",
    "        \"articles\": None,\n",
    "        \"summaries\": None,\n",
    "        \"report\": None\n",
    "    })\n",
    "    \n",
    "    # Display results\n",
    "    print(\"\\n=== AI/ML Weekly News Report ===\\n\")\n",
    "    print(final_state['report'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 📝 Customization Options\n",
    "\n",
    "1. Modify search parameters in `NewsSearcher`:\n",
    "   - `search_depth`: \"basic\" or \"advanced\"\n",
    "   - `max_results`: Number of articles to fetch\n",
    "   - `time_period`: \"1d\", \"1w\", \"1m\", etc.\n",
    "\n",
    "2. Adjust summarization in `Summarizer`:\n",
    "   - Update `system_prompt` for different summary styles\n",
    "   - Modify GPT model parameters (temperature, max_tokens)\n",
    "\n",
    "3. Customize report format in `Publisher`:\n",
    "   - Edit the report prompt for different layouts\n",
    "   - Modify markdown template"
   ]
  },
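  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The search options above can be collected in one place so that they are validated before a request is sent. The sketch below is one way to do that. `build_search_kwargs` is a hypothetical helper, and its keys are assumed to match the Tavily search parameters (`days` limits news results to recent days, `include_domains` restricts the sources); check the Tavily API reference for the client version you have installed.\n",
    "\n",
    "```python\n",
    "def build_search_kwargs(days=7, max_results=5, search_depth='advanced', include_domains=None):\n",
    "    # Validate the options before any request is made\n",
    "    if search_depth not in ('basic', 'advanced'):\n",
    "        raise ValueError('search_depth must be basic or advanced')\n",
    "    if days < 1 or max_results < 1:\n",
    "        raise ValueError('days and max_results must be positive')\n",
    "    kwargs = {\n",
    "        'query': 'artificial intelligence and machine learning news',\n",
    "        'topic': 'news',\n",
    "        'days': days,\n",
    "        'search_depth': search_depth,\n",
    "        'max_results': max_results,\n",
    "    }\n",
    "    # Only send a domain filter when one was requested\n",
    "    if include_domains:\n",
    "        kwargs['include_domains'] = list(include_domains)\n",
    "    return kwargs\n",
    "\n",
    "print(build_search_kwargs(days=1, max_results=3)['days'])  # prints 1\n",
    "```\n",
    "\n",
    "The result can be unpacked into the client call, for example `tavily.search(**build_search_kwargs(days=1))`."
   ]
  },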
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🤔 Additional Considerations\n",
    "\n",
    "### Current Limitations\n",
    "\n",
    "1. **Content Access**\n",
    "   - Limited to publicly available news\n",
    "   - Dependency on Tavily API coverage\n",
    "   - No access to paywalled content\n",
    "\n",
    "2. **Language Support**\n",
    "   - Primary focus on English content\n",
    "\n",
    "3. **Technical Constraints**\n",
    "   - API rate limits\n",
    "\n",
    "### Potential Improvements\n",
    "\n",
    "1. **Enhanced News Collection**\n",
    "   - Specify domains to search on depending on user preference\n",
    "\n",
    "2. **Improved Summarization**\n",
    "   - Add multi-language support\n",
    "   - Implement fact-checking\n",
    "\n",
    "3. **Advanced Features**\n",
    "   - Topic classification\n",
    "   - Trend detection\n",
    "\n",
    "### Specific Use Cases\n",
    "\n",
    "1. **Research Organizations**\n",
    "   - Track technology developments\n",
    "   - Monitor competition\n",
    "   - Identify collaboration opportunities\n",
    "\n",
    "2. **Educational Institutions**\n",
    "   - Create teaching materials\n",
    "   - Support student research\n",
    "   - Track field developments\n",
    "\n",
    "3. **Tech Companies**\n",
    "   - Market intelligence\n",
    "   - Innovation tracking\n",
    "   - Strategic planning\n",
    "\n",
    "4. **Media Organizations**\n",
    "   - Content curation\n",
    "   - Story research\n",
    "   - Trend analysis"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 🔍 Troubleshooting\n",
    "\n",
    "1. API Key Issues:\n",
    "   - Ensure `.env` file exists and contains valid keys\n",
    "   - Check API key permissions and quotas\n",
    "\n",
    "2. Package Dependencies:\n",
    "   - Run `pip list` to verify installations\n",
    "   - Check package versions for compatibility\n",
    "\n",
    "3. Rate Limits:\n",
    "   - Monitor API usage\n",
    "   - Implement retry logic if needed"
   ]
  },
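  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For the retry logic mentioned under rate limits, a common pattern is exponential backoff. The sketch below is a minimal version using only the standard library. `with_retries` is a hypothetical helper, and it catches every exception for brevity; in real use, narrow this to the rate-limit error raised by the client you are calling.\n",
    "\n",
    "```python\n",
    "import time\n",
    "\n",
    "def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):\n",
    "    # Call func(); after a failure wait base_delay * 2**attempt, then try again.\n",
    "    # The last failure is re-raised so the caller still sees the error.\n",
    "    for attempt in range(attempts):\n",
    "        try:\n",
    "            return func()\n",
    "        except Exception:\n",
    "            if attempt == attempts - 1:\n",
    "                raise\n",
    "            sleep(base_delay * 2 ** attempt)\n",
    "\n",
    "calls = []\n",
    "\n",
    "def flaky():\n",
    "    # Fails twice, then succeeds, to imitate a temporary rate limit\n",
    "    calls.append(1)\n",
    "    if len(calls) < 3:\n",
    "        raise RuntimeError('rate limited')\n",
    "    return 'ok'\n",
    "\n",
    "# A no-op sleep keeps the demonstration instant\n",
    "print(with_retries(flaky, sleep=lambda seconds: None))  # prints ok\n",
    "```\n",
    "\n",
    "In this notebook the search could be wrapped as `with_retries(searcher.search)`, where `searcher` is a `NewsSearcher` instance."
   ]
  },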
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## References:\n",
    "- Tavily search API doc: https://docs.tavily.com/docs/rest-api/api-reference\n",
    "- LangGraph Conceptual Guides: https://langchain-ai.github.io/langgraph/concepts/low_level/\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
